Sleep Quality Estimation Using Accelerometer Data from Thigh-Mounted Devices in Free-Living Conditions

(working title)

Esben Lykke, PhD student

31 March 2023

Background

  • Sleep plays a vital role in health, so improving sleep–wake assessment outside the laboratory is critical
  • The gold standard, polysomnography (PSG), is costly and inconvenient
  • Methods for estimating sleep/wake from accelerometry exist, primarily for wrist-worn devices
  • The Cole-Kripke and Sadeh algorithms are commonly used
  • Determining in-bed time is difficult; it is usually set by sleep logs and/or human scorers
  • Detecting wakefulness is difficult, and performance is worse in populations with sleep disorders
  • Analysis is typically at two levels: epoch-based and summarized across night(s)
  • Zmachine-derived sleep stats
  • Purpose…

But Esben, what about them sleep stages!?

  • I did free-living PSG recordings of sleep but…
    • Super fragile -> shitty data
    • Cumbersome and time-consuming
    • free-living when wired up like a robot?
    • would surface skin temperature + acc be enough? Most likely needs HR

It was likely a dead end from the get-go :(

Methods

  • Data preparation: handling raw ACC data is the big time-consumer
  • Only thigh data used. HSBC and the others are thigh data only…
  • All ZM recording time is considered in-bed (sensor problem?)
  • No sleep stages, only asleep/awake
  • Sensor problems during sleep: up to 20 consecutive epochs (200 sec) are treated as sleep
  • Combine binary classifiers to produce multilabel outcomes (i.e., binary relevance)
  • Evaluate combined classifiers using weighted micro-averaging
  • Adjust the threshold for in-bed awake, or give it the edge in tie-breakers??
  • ZM sleep stats inclusion criteria
  • Three approaches: two binary classifiers (risk of ghost classes), binary relevance (choosing the right tie-breaker), and multiclass
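The gap rule for sensor problems could be sketched roughly as below. This is a minimal illustration, not the actual preprocessing code: the function name `fill_short_gaps`, the use of `None` to mark a sensor-problem epoch, and the requirement that a gap be flanked by sleep on both sides are all assumptions made for the example.

```python
from itertools import groupby


def fill_short_gaps(labels, max_gap=20, gap=None, fill="sleep"):
    """Relabel short runs of sensor-problem epochs as sleep.

    Assumptions (not from the slides): a problem epoch is marked with `gap`
    (None here), and a run is only filled when it is at most `max_gap` epochs
    long and has a sleep epoch directly before and after it.
    """
    labels = list(labels)
    out = list(labels)
    pos = 0
    for value, run in groupby(labels):
        length = sum(1 for _ in run)
        start, end = pos, pos + length
        pos = end
        if value != gap or length > max_gap:
            continue  # not a gap, or too long to be treated as sleep
        flanked = (
            start > 0
            and end < len(labels)
            and labels[start - 1] == fill
            and labels[end] == fill
        )
        if flanked:
            out[start:end] = [fill] * length
    return out


# A 2-epoch gap is filled; a 25-epoch gap is left alone.
night = ["sleep"] * 3 + [None] * 2 + ["sleep"] * 2 + [None] * 25 + ["sleep"]
filled = fill_short_gaps(night)
```

Longer gaps stay unlabeled here, so they can be handled (or excluded) separately.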

Exclusion Criteria

Features

Basic Features

  • Weekday
  • Time of Day
  • Placement
  • Temperature

ACC derived features1

  • Mean ACC X
  • Mean ACC Y
  • Mean ACC Z
  • Standard Deviation X
  • Standard Deviation Y
  • Standard Deviation Z
  • Max Standard Deviation
  • Inclination
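A rough sketch of how the ACC-derived features above might be computed for one epoch of raw samples. The function name `epoch_features`, the choice of x as the axis along the thigh, and the definition of inclination as the angle between that axis and gravity are assumptions for illustration; the actual feature code may differ.

```python
import math
from statistics import fmean, pstdev


def epoch_features(x, y, z):
    """Summarize one epoch of raw triaxial samples (in g).

    Assumption: x runs along the thigh, so inclination is taken as the angle
    (degrees) between the mean acceleration vector and the x axis.
    """
    mean_x, mean_y, mean_z = fmean(x), fmean(y), fmean(z)
    sd_x, sd_y, sd_z = pstdev(x), pstdev(y), pstdev(z)
    norm = math.sqrt(mean_x**2 + mean_y**2 + mean_z**2)
    if norm > 0:
        cos_angle = max(-1.0, min(1.0, mean_x / norm))  # guard against rounding
        inclination = math.degrees(math.acos(cos_angle))
    else:
        inclination = float("nan")
    return {
        "mean_x": mean_x, "mean_y": mean_y, "mean_z": mean_z,
        "sd_x": sd_x, "sd_y": sd_y, "sd_z": sd_z,
        "sd_max": max(sd_x, sd_y, sd_z),
        "inclination": inclination,
    }


# Thigh vertical (standing-like) vs. thigh horizontal (lying-like).
upright = epoch_features([1.0] * 5, [0.0] * 5, [0.0] * 5)
lying = epoch_features([0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
```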

Sensor-Independent Features2

  • Clock Proxy Linear
  • Clock Proxy Cosine

Human Circadian Clock

Forger, Jewett, and Kronauer (1999): a so-called cubic van der Pol equation

\[\frac{dx_c}{dt}=\frac{\pi}{12}\left\{\mu\left(x_c-\frac{4x_c^3}{3}\right)-x\left[\left(\frac{24}{0.99669\,\tau_x}\right)^2+kB\right]\right\}\]

This thing is dependent on ambient light and body temperature!

Walch et al. (2019) incorporated this feature using step counts from the Apple Watch

But as demonstrated by Walch et al. (2019), a simple cosine function does the trick just as well :)
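A minimal sketch of what the two sensor-independent clock proxies could look like. The phase choices are assumptions picked for illustration (a cosine trough at 04:00 and a linear ramp restarting at noon), not the values used by Walch et al. (2019) or in this project.

```python
import math


def clock_proxy_cosine(hour, trough_hour=4.0):
    """24-hour cosine, -1 at `trough_hour` and +1 twelve hours later.

    `trough_hour` is a hypothetical parameter, not a value from the slides.
    """
    return -math.cos(2 * math.pi * (hour - trough_hour) / 24)


def clock_proxy_linear(hour, start_hour=12.0):
    """Linear ramp from 0 to 1 over 24 hours, restarting at `start_hour`."""
    return ((hour - start_hour) % 24) / 24


# Both proxies depend only on clock time, not on any sensor signal.
proxies_at_midnight = (clock_proxy_cosine(0.0), clock_proxy_linear(0.0))
```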

Circadian Proxy Features

Present three modeling approaches

  • Multiclass classifier with three labels: in-bed-asleep, in-bed-awake, and out-bed-awake
    • pros: simplicity, speed, joint modeling, consistency across classes
    • cons: struggle with class imbalance, performance
  • Two binary classifiers: in-bed/out-bed and asleep/awake
    • pros:
    • cons: ghost classes
  • Three binary classifiers: in-bed-asleep 1/0, in-bed-awake 1/0, and out-bed-awake 1/0
    • pros: more flexible, can better handle class imbalance
    • cons: complex training, low speed, feature redundancy
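To make the "ghost class" and tie-breaker issues concrete, here is a small sketch of how the binary outputs might be combined into one label per epoch. The function names and the tie-breaker order (favoring the rare in-bed-awake class) are assumptions for illustration only.

```python
def combine_two_binary(in_bed, asleep):
    """Combine the in-bed/out-bed and asleep/awake predictions.

    The two models are independent, so nothing prevents the impossible
    combination "out of bed and asleep": the ghost class.
    """
    if in_bed and asleep:
        return "in-bed-asleep"
    if in_bed:
        return "in-bed-awake"
    if asleep:
        return "out-bed-asleep"  # ghost class, needs a resolution rule
    return "out-bed-awake"


def combine_binary_relevance(
    probs, priority=("in-bed-awake", "in-bed-asleep", "out-bed-awake")
):
    """Pick the label with the highest one-vs-rest probability.

    Ties go to the label listed first in `priority` (a hypothetical order
    that gives the edge to the rare in-bed-awake class).
    """
    best = max(probs[label] for label in priority)
    return next(label for label in priority if probs[label] == best)


ghost = combine_two_binary(in_bed=False, asleep=True)
tie = combine_binary_relevance(
    {"in-bed-asleep": 0.5, "in-bed-awake": 0.5, "out-bed-awake": 0.1}
)
```

The multiclass approach avoids both problems by construction, since it always outputs exactly one of the three valid labels.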

Building Models

Estimate Sleep Quality Metrics

Results

Epoch-Based

  • Performance Metrics
    • F1 Score
    • Accuracy
    • Sensitivity
    • Specificity
    • ROC curves
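For reference, the threshold-based metrics listed above can be computed from the four confusion-matrix counts. This is a generic sketch (the function name `binary_metrics` and the zero-division convention of returning 0.0 are assumptions), not the evaluation code behind the tables.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, precision and F1 for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else 0.0  # assumed convention for 0/0

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# A model that never predicts the rare class still gets high accuracy.
never_positive = binary_metrics([1, 1, 0, 0, 0, 0, 0, 0], [0] * 8)
```

The example shows why accuracy alone is misleading for an imbalanced class: accuracy is 75% while sensitivity and F1 are zero.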

Summarized across nights

  • Agreement With Zmachine Sleep Stats
    • Sleep Period Time
    • Total Sleep Time
    • Sleep Efficiency
    • Latency Until Persistent Sleep
    • Wake After Sleep Onset
  • Minimal Detectable Change
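A sketch of how the night-level sleep statistics might be derived from epoch-level predictions. All definitions here are assumptions made for the example and may not match the Zmachine definitions: Sleep Period Time is taken as the whole in-bed window, persistent sleep as the first run of `persistent_min` minutes of consecutive sleep, and WASO as awake time between that onset and the last sleep epoch. The 10-second default epoch follows the "20 epochs (200 sec)" note in Methods.

```python
def sleep_stats(epochs, epoch_sec=10, persistent_min=10):
    """Night-level statistics from 'asleep'/'awake' labels of one in-bed period."""
    asleep = [e == "asleep" for e in epochs]
    n = len(asleep)
    spt_hrs = n * epoch_sec / 3600
    tst_hrs = sum(asleep) * epoch_sec / 3600
    se_pct = 100 * tst_hrs / spt_hrs if n else float("nan")

    # Find the first run of consecutive sleep long enough to count as persistent.
    need = max(1, round(persistent_min * 60 / epoch_sec))
    onset, run = None, 0
    for i, is_asleep in enumerate(asleep):
        run = run + 1 if is_asleep else 0
        if run == need:
            onset = i - need + 1
            break

    if onset is None:
        lps_min = waso_min = None  # no persistent sleep found
    else:
        last = max(i for i, is_asleep in enumerate(asleep) if is_asleep)
        lps_min = onset * epoch_sec / 60
        waso_min = sum(not a for a in asleep[onset:last + 1]) * epoch_sec / 60
    return {"spt_hrs": spt_hrs, "tst_hrs": tst_hrs, "se_pct": se_pct,
            "lps_min": lps_min, "waso_min": waso_min}


# Toy night with 1-minute epochs and a 2-minute persistence criterion.
toy = ["awake"] * 3 + ["asleep"] * 5 + ["awake"] * 2 + ["asleep"] * 4 + ["awake"]
stats = sleep_stats(toy, epoch_sec=60, persistent_min=2)
```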

Many ROC Curves

Lots of Metrics

 

Performance of the multiclass classifiers

Metric       Decision Tree  Decision Tree SMOTE  Logistic Regression  Neural Network  XGBoost
F1 Score     87.95%         87.00%               79.40%               91.97%          88.20%
Accuracy     89.49%         86.06%               78.22%               88.99%          89.47%
Sensitivity  89.49%         86.06%               78.22%               88.99%          89.47%
Precision    88.19%         88.16%               80.38%               89.30%          87.99%
Specificity  91.55%         93.66%               70.58%               90.66%          91.39%

Weighted macro average: metrics are calculated by averaging the per-class scores, weighted by the number of samples in each class.
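The weighting described in the table note can be sketched as follows. The function name `weighted_recall` is an assumption; the point of the example is that support-weighted recall (sensitivity) is mathematically the same as overall accuracy, which is consistent with the identical Accuracy and Sensitivity rows above.

```python
from collections import Counter


def weighted_recall(y_true, y_pred):
    """Average the per-class recall, weighted by each class's sample count."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        hits = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        score += (n / total) * (hits / n)  # class weight times class recall
    return score


y_true = ["asleep", "asleep", "asleep", "in-bed-awake", "out", "out"]
y_pred = ["asleep", "asleep", "out", "asleep", "out", "out"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
w_recall = weighted_recall(y_true, y_pred)
```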

 

Performance of the two binary classifiers

Metric       Decision Tree  Logistic Regression  Neural Network  XGBoost

In-bed Prediction
F1 Score     92.35%         88.77%               92.39%          92.41%
Accuracy     93.79%         91.48%               93.86%          93.82%
Sensitivity  91.85%         82.42%               91.23%          92.17%
Precision    92.87%         96.19%               93.58%          92.66%
Specificity  95.13%         97.74%               95.68%          94.96%

Sleep Prediction
F1 Score     87.74%         84.56%               88.12%          88.39%
Accuracy     91.11%         89.78%               91.51%          91.67%
Sensitivity  91.98%         80.90%               90.98%          91.71%
Precision    83.87%         88.57%               85.43%          85.30%
Specificity  90.64%         94.48%               91.80%          91.65%

Performance of the binary relevance classifiers

Metric       Decision Tree  Logistic Regression  Neural Network  XGBoost

In-Bed Asleep Prediction
F1 Score     87.68%         84.56%               88.19%          88.44%
Accuracy     91.06%         89.78%               91.50%          91.71%
Sensitivity  92.04%         80.90%               91.70%          91.69%
Precision    83.72%         88.57%               84.94%          85.42%
Specificity  90.54%         94.48%               91.40%          91.73%

In-Bed Awake Prediction
F1 Score     22.22%         0.00%                20.15%          25.48%
Accuracy     69.68%         93.74%               94.04%          93.48%
Sensitivity  69.19%         0.00%                12.00%          17.79%
Precision    13.24%         0.00%                62.77%          44.87%
Specificity  69.71%         100.00%              99.52%          98.54%

Out-Bed Awake Prediction
F1 Score     94.79%         93.14%               94.88%          94.81%
Accuracy     93.87%         91.48%               93.90%          93.85%
Sensitivity  94.41%         97.74%               95.49%          94.96%
Precision    95.18%         88.95%               94.27%          94.66%
Specificity  93.07%         82.42%               91.60%          92.24%

Bland-Altman Plots

Bland-Altman Analysis

Model                Bias (95% CI)            Lower LOA (95% CI)        Upper LOA (95% CI)

Sleep Period Time (hrs)
Decision Tree        -0.16 (-0.29; -0.05)     -2.72 (-3.07; -2.41)      2.4 (1.97; 2.85)
Logistic Regression  -1.12 (-1.26; -0.99)     -3.92 (-4.27; -3.64)      1.68 (1.33; 2.09)
Neural Network       -0.34 (-0.47; -0.24)     -2.68 (-3.05; -2.43)      2 (1.67; 2.45)
XGBoost              -0.17 (-0.29; -0.06)     -2.7 (-3.09; -2.43)       2.35 (2; 2.82)

Total Sleep Time (hrs)
Decision Tree        -0.09 (-0.2; 0.02)       -2.43 (-2.68; -2.17)      2.25 (2.02; 2.52)
Logistic Regression  -0.62 (-0.73; -0.52)     -2.98 (-3.21; -2.77)      1.73 (1.51; 2)
Neural Network       -0.14 (-0.26; -0.04)     -2.42 (-2.73; -2.15)      2.14 (1.91; 2.39)
XGBoost              -0.1 (-0.2; 0.01)        -2.38 (-2.65; -2.14)      2.18 (1.96; 2.45)

Sleep Efficiency (%)
Decision Tree        1.52 (0.66; 2.4)         -17.42 (-19.28; -15.38)   20.46 (19.18; 21.9)
Logistic Regression  4.49 (3.76; 5.17)        -10.44 (-12.57; -8.61)    19.42 (18.11; 21.09)
Neural Network       2.46 (1.67; 3.32)        -14.41 (-16.44; -12.54)   19.33 (18.01; 20.79)
XGBoost              1.34 (0.55; 2.36)        -17.01 (-19.05; -14.76)   19.68 (18.31; 21.17)

Latency Until Persistent Sleep (min)
Decision Tree        -1.02 (-3.41; 1.56)      -52.44 (-57.7; -46.72)    50.41 (44.24; 56.4)
Logistic Regression  3.25 (0.98; 5.76)        -49.99 (-56; -44.4)       56.5 (52.05; 61.69)
Neural Network       -5.98 (-8.33; -3.8)      -55.58 (-61.06; -50.5)    43.61 (38.75; 48.74)
XGBoost              -0.92 (-3.35; 1.38)      -53.88 (-59.62; -48.52)   52.04 (46.84; 57.8)

Wake After Sleep Onset (min)
Decision Tree        2.16 (-0.79; 5.12)       -64.25 (-69.41; -58.52)   68.57 (61.94; 75.05)
Logistic Regression  -8.84 (-11.28; -6.58)    -62.9 (-68.06; -57.96)    45.22 (40.89; 50.25)
Neural Network       -5.05 (-7.47; -2.51)     -60.07 (-65.4; -55.64)    49.96 (45.29; 55.95)
XGBoost              -1.58 (-4.41; 1.12)      -61.48 (-67.45; -56.22)   58.31 (52.88; 64.23)

Bootstrapped mixed-effects limits of agreement with multiple observations per subject (Parker et al. 2016)
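For orientation, the basic quantities in a Bland-Altman analysis can be sketched as below. This is the simple textbook version, which treats every night as independent; it is not the bootstrapped mixed-effects method of Parker et al. (2016) used for the table, and the function name `bland_altman` and the toy numbers are made up for the example. Differences are taken as model minus reference, which appears consistent with the sign of the biases reported above.

```python
from statistics import fmean, stdev


def bland_altman(pred, ref):
    """Bias and 95% limits of agreement (bias +/- 1.96 * SD of differences).

    Simplification: ignores repeated nights per subject, so the limits will
    generally differ from mixed-effects estimates.
    """
    diffs = [p - r for p, r in zip(pred, ref)]
    bias = fmean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Toy total sleep times in hours (illustrative values, not study data).
model_tst = [8.0, 7.5, 9.0, 8.5]
reference_tst = [8.5, 7.0, 9.5, 8.0]
bias, lower_loa, upper_loa = bland_altman(model_tst, reference_tst)
```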

Descriptives of Sleep Quality Statistics Across Methods

Method               mean (SD)¹       r (95% CI)²

Sleep Period Time (hrs)
ZMachine Insight+    9.7 (1.03)       -
Decision Tree        9.47 (1.14)      0.31 (0.21 - 0.41)
Logistic Regression  8.53 (1.27)      0.21 (0.09 - 0.32)
Neural Network       9.3 (1.07)       0.34 (0.24 - 0.44)
XGBoost              9.44 (1.1)       0.31 (0.2 - 0.41)

Total Sleep Time (hrs)
ZMachine Insight+    8.17 (1.04)      -
Decision Tree        8.08 (1.29)      0.31 (0.2 - 0.41)
Logistic Regression  7.53 (1.2)       0.23 (0.11 - 0.33)
Neural Network       8.02 (1.32)      0.39 (0.29 - 0.48)
XGBoost              8.06 (1.3)       0.36 (0.26 - 0.45)

Sleep Efficiency (%)
ZMachine Insight+    84.66 (5.47)     -
Decision Tree        86.45 (8.25)     0.01 (-0.1 - 0.13)
Logistic Regression  89.3 (6.47)      0.17 (0.05 - 0.28)
Neural Network       87.29 (8.01)     0.16 (0.04 - 0.27)
XGBoost              86.29 (8.68)     0.15 (0.03 - 0.26)

Latency Until Persistent Sleep (min)
ZMachine Insight+    34.06 (22.25)    -
Decision Tree        31.32 (17.59)    0.23 (0.12 - 0.34)
Logistic Regression  36.99 (16.41)    0.04 (-0.08 - 0.16)
Neural Network       27.18 (17.61)    0.23 (0.12 - 0.34)
XGBoost              32.25 (17.44)    0.22 (0.11 - 0.33)

Wake After Sleep Onset (min)
ZMachine Insight+    38.07 (24.25)    -
Decision Tree        38.38 (25.85)    -0.01 (-0.13 - 0.1)
Logistic Regression  27.85 (17.66)    0.16 (0.04 - 0.27)
Neural Network       32.47 (21.52)    0.1 (-0.02 - 0.21)
XGBoost              35.63 (21.29)    -0.02 (-0.13 - 0.1)

¹ Sleep outcome means and standard deviations.
² Repeated measures correlation coefficient between each outcome and ZMachine Insight+, with the corresponding 95% confidence interval.

In-bed classification flow

Sleep classification flow

Discussion

  • heteroscedasticity
  • Cheung 2018, Table 4: actigraphy provides a sufficiently narrow range of possible mean differences (95% CI) relative to clinically significant thresholds
  • could be interesting to build models on thigh and hip combined
  • multiclass vs multilabel classification
  • in-bed awake/sleep is highly imbalanced -> maybe train a new classifier accounting for imbalanced data (SMOTE)
  • model combined preds instead?

References

Forger, D. B., M. E. Jewett, and R. E. Kronauer. 1999. “A Simpler Model of the Human Circadian Pacemaker.” Journal of Biological Rhythms 14 (6): 532–37. https://doi.org/10.1177/074873099129000867.
Hirshkowitz, Max, Kaitlyn Whiton, Steven M Albert, Cathy Alessi, Oliviero Bruni, Lydia DonCarlos, Nancy Hazen, et al. 2015. “National Sleep Foundation’s Sleep Time Duration Recommendations: Methodology and Results Summary.” Sleep Health, 4.
Skotte, Jørgen, Mette Korshøj, Jesper Kristiansen, Christiana Hanisch, and Andreas Holtermann. 2014. “Detection of Physical Activity Types Using Triaxial Accelerometers.” Journal of Physical Activity and Health 11 (1): 76–84. https://doi.org/10.1123/jpah.2011-0347.
Walch, Olivia, Yitong Huang, Daniel Forger, and Cathy Goldstein. 2019. “Sleep Stage Prediction with Raw Acceleration and Photoplethysmography Heart Rate Data Derived from a Consumer Wearable Device.” Sleep 42 (12): zsz180. https://doi.org/10.1093/sleep/zsz180.